Modeling segmental durations for Japanese text-to-speech synthesis
نویسندگان
چکیده
Accurate estimation of segmental durations is crucial for naturalsounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of both consonants and vowels. A Sum-of-Products approach is used to model key interactions observed in the data, and to predict values of factor combinations not found in the speech database. We report overall observed-predicted correlations of 0.88 for vowels (RMSdev: 16.8ms) and 0.94 for consonants (RMSdev: 12.5ms).
منابع مشابه
Modeling vowel duration for Japanese text-to-speech synthesis
Accurate estimation of segmental durations is crucial for naturalsounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of bot...
متن کاملModeling segmental duration in German text-to-speech synthesis
This paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has been implemented in the German version of the Bell Labs text-tospeech system [18, 12]. The construction of the duration system was made efficient by the use of an interactive statistical analysis packag...
متن کاملDuration modeling for hindi text-to-speech synthesis system
This paper reports preliminary results of data-driven modeling of segmental (phoneme) duration for Hindi. Classification and Regression Tree (CART) based datadriven duration modeling for segmental duration prediction is presented. A number of features are considered and their usefulness and relative contribution for segmental duration prediction is assessed. Objective evaluation of the duration...
متن کاملDuration Modeling For Turkish Text-to-Speech Synthesis System
Naturalness of synthetic speech depends on appropriate modeling of prosodic aspects. Mostly, three prosody components are modeled: segmental duration, pitch contour and intensity. In this study, we present our work on modeling segmental duration in Turkish by using machine-learning algorithms. The models predict phone durations based on attributes such as phone identity, neighboring phone ident...
متن کاملCART-based duration modeling using a novel method of extracting prosodic features
The prediction of accurate segmental durations remains a difficult problem when synthesising speech from text. Inaccurate durations are often perceptually prominent and detract from the naturalness of the quality of speech. For a concatenative system, a statistical approach is an excellent way of predicting segmental durations. More specifically a CART (Classification And Regression Tree) metho...
متن کامل